Universal Speech Model
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Huang, W. Ronny, Allauzen, Cyril, Chen, Tongzhou, Gupta, Kilol, Hu, Ke, Qin, James, Zhang, Yu, Wang, Yongqiang, Chang, Shuo-Yiin, Sainath, Tara N.
In the era of large models, the autoregressive nature of decoding often makes latency a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average relative WER improvement across all languages of 10.8% on FLEURS and 3.6% on YouTube captioning. Furthermore, our comprehensive ablation study analyzes key parameters such as LLM size, context length, vocabulary size, and fusion methodology. For instance, we explore the impact of LLM size ranging from 128M to 340B parameters on ASR performance. This study provides valuable insights into the factors influencing the effectiveness of practical large-scale LM-fused speech recognition systems.
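The per-segment scoring idea can be illustrated with a minimal sketch: each segment's n-best ASR hypotheses are scored by an LM (all hypotheses can be batched into one accelerator-friendly forward pass, avoiding autoregressive decoding latency), and the LM score is interpolated with the ASR score. The fusion weight, the `Hypothesis` class, and the toy `lm_logprob` stand-in are all illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of per-segment LM-fusion rescoring. The fusion weight and
# the toy LM scorer below are illustrative assumptions, not the paper's setup.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_logprob: float  # score from the non-autoregressive ASR model

def lm_logprob(text: str) -> float:
    """Stand-in for an LLM scoring call (e.g., a PaLM 2 log-probability).
    Toy heuristic: penalize length. A real system would batch every
    hypothesis of a segment into a single parallel forward pass."""
    return -0.5 * len(text.split())

def rescore_segment(nbest: list[Hypothesis], lm_weight: float = 0.3) -> Hypothesis:
    """Pick the hypothesis maximizing asr_logprob + lm_weight * lm_logprob."""
    return max(nbest, key=lambda h: h.asr_logprob + lm_weight * lm_logprob(h.text))

nbest = [
    Hypothesis("recognize speech", -1.2),
    Hypothesis("wreck a nice beach", -1.0),
]
best = rescore_segment(nbest)  # the LM term tips the choice to the shorter text
```

Because scoring is per segment rather than per token, the LM calls are independent and parallelize trivially across hypotheses and segments.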
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
Soltau, Hagen, Shafran, Izhak, Ottenwess, Alex, Duffy, Joseph R. JR, Utianski, Rene L., Barnard, Leland R., Stricker, John L., Wiepert, Daniela, Jones, David T., Botha, Hugo
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders. We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings. Our model compresses long sequences into a small set of class-specific latent representations, and a factorized projection is used to predict different attributes of the disordered input speech. The benefit of our approach is that it allows us to model different regions of the input for different classes while remaining data efficient. We evaluated the proposed model extensively on a curated corpus from the Mayo Clinic. Our model achieves an average accuracy of 83.1%, outperforming standard transformer (80.9%) and Perceiver (81.8%) models. With limited task-specific data, we find that pretraining is important and, surprisingly, pretraining on the unrelated automatic speech recognition (ASR) task is also beneficial. Encodings from the middle layers provide a mix of both acoustic and phonetic information and achieve the best prediction results compared to using only the final-layer encodings (83.1% vs. 79.6%). The results are promising and, with further refinements, may help clinicians detect speech abnormalities without needing access to highly specialized speech-language pathologists.
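The compression step described above can be sketched as Perceiver-style cross-attention: a small set of learned class-specific latents queries the long sequence of frame encodings, yielding a fixed-size summary, after which small per-attribute heads (a loose stand-in for the factorized projection) produce predictions. All shapes, the single-head attention, and the head layout are illustrative assumptions, not the paper's architecture.

```python
# Hedged NumPy sketch of the Perceiver-style compression idea: learned
# class-specific latents cross-attend over a long encoding sequence.
# Shapes and the single attention head are illustrative, not the paper's.

import numpy as np

rng = np.random.default_rng(0)
T, d = 1000, 64            # long input sequence (e.g., USM frame encodings)
n_latents, n_attrs = 8, 4  # class-specific latents; speech attributes to predict

x = rng.normal(size=(T, d))                # input encodings
latents = rng.normal(size=(n_latents, d))  # learned queries, one per class/slot

def cross_attend(q, kv):
    """Single-head scaled dot-product attention: queries q over memory kv."""
    scores = q @ kv.T / np.sqrt(kv.shape[1])          # (n_latents, T)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over time
    return weights @ kv                               # (n_latents, d)

compressed = cross_attend(latents, x)  # fixed-size summary of the long input

# Loose stand-in for a factorized projection: a separate small head predicts
# each attribute from the shared compressed representation (3 classes each).
heads = rng.normal(size=(n_attrs, d, 3))
logits = np.einsum('ld,adc->alc', compressed, heads).mean(axis=1)  # (n_attrs, 3)
```

The key property is that the attention cost scales with `n_latents * T` rather than `T * T`, so very long recordings compress into a constant-size representation before classification.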
Google AI Updates Universal Speech Model to Scale Automatic Speech Recognition Beyond 100 Languages
Google AI has recently unveiled an update to its Universal Speech Model (USM) in support of the 1,000 Languages Initiative. The new model outperforms OpenAI's Whisper across all segments of automatic speech recognition. A universal speech model is a machine learning model trained to recognize and understand spoken language across different languages and accents. USM is a family of state-of-the-art speech models with 2B parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300 languages. According to Google, USM can perform automatic speech recognition (ASR) on languages ranging from under-resourced ones like Amharic, Cebuano, Assamese, and Azerbaijani to widely spoken ones like English and Mandarin.
Google Develops AI Model Capable Of Understanding Over 1,000 Languages
Google has developed an AI model called the Universal Speech Model that can recognize and transcribe speech in over 1,000 languages, including many endangered ones. The model was trained on a dataset of 12 million hours of speech and can adapt to new languages quickly with minimal data. The Universal Speech Model has the potential to revolutionize speech-to-text technology and enable better communication and the preservation of endangered languages.